Goto

Collaborating Authors

 low-pass filter




Enhancing DPSGD via Per-Sample Momentum and Low-Pass Filtering

arXiv.org Artificial Intelligence

Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to train deep neural networks with formal privacy guarantees. However, the addition of differential privacy (DP) often degrades model accuracy by introducing both noise and bias. Existing techniques typically address only one of these issues, as reducing DP noise can exacerbate clipping bias and vice-versa. In this paper, we propose a novel method, \emph{DP-PMLF}, which integrates per-sample momentum with a low-pass filtering strategy to simultaneously mitigate DP noise and clipping bias. Our approach uses per-sample momentum to smooth gradient estimates prior to clipping, thereby reducing sampling variance. It further employs a post-processing low-pass filter to attenuate high-frequency DP noise without consuming additional privacy budget. We provide a theoretical analysis demonstrating an improved convergence rate under rigorous DP guarantees, and our empirical evaluations reveal that DP-PMLF significantly enhances the privacy-utility trade-off compared to several state-of-the-art DPSGD variants.


Grassmanian Interpolation of Low-Pass Graph Filters: Theory and Applications

arXiv.org Artificial Intelligence

Low-pass graph filters are fundamental for signal processing on graphs and other non-Euclidean domains. However, the computation of such filters for parametric graph families can be prohibitively expensive as computation of the corresponding low-frequency subspaces, requires the repeated solution of an eigenvalue problem. We suggest a novel algorithm of low-pass graph filter interpolation based on Riemannian interpolation in normal coordinates on the Grassmann manifold. We derive an error bound estimate for the subspace interpolation and suggest two possible applications for induced parametric graph families. First, we argue that the temporal evolution of the node features may be translated to the evolving graph topology via a similarity correction to adjust the homophily degree of the network. Second, we suggest a dot product graph family induced by a given static graph which allows to infer improved message passing scheme for node classification facilitated by the filter interpolation.


Data Unlearning Beyond Uniform Forgetting via Diffusion Time and Frequency Selection

arXiv.org Artificial Intelligence

Data unlearning aims to remove the influence of specific training samples from a trained model without requiring full retraining. Unlike concept unlearning, data unlearning in diffusion models remains underexplored and often suffers from quality degradation or incomplete forgetting. To address this, we first observe that most existing methods attempt to unlearn the samples at all diffusion time steps equally, leading to poor-quality generation. We argue that forgetting occurs disproportionately across time and frequency, depending on the model and scenarios. By selectively focusing on specific time-frequency ranges during training, we achieve samples with higher aesthetic quality and lower noise. Finally, to evaluate both deletion and quality of unlearned data samples, we propose a simple normalized version of SSCD. Together, our analysis and methods establish a clearer understanding of the unique challenges in data unlearning for diffusion models, providing practical strategies to improve both evaluation and unlearning performance. The ability to remove the influence of training samples from a learned model, often referred to as machine unlearning (Bourtoule et al., 2021), has become increasingly important. Regulatory frameworks such as the "right to be forgotten" in the General Data Protection Regulation (GDPR) by the European Union and growing concerns about sensitive or proprietary data have created demand for methods that allow models to forget without costly retraining from scratch. Recently, with the development of generative models such as diffusion models (Ho et al., 2020), unlearning the unsafe concept or memorization has been actively explored through training-free sampling (Kim et al., 2025), output filtering (Y oon et al., 2025), and fine-tuning (Wang et al., 2025a).


DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction Xinwei Zhang University of Southern California Zhiqi Bu

Neural Information Processing Systems

Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP op-timizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. However, in practice, DP models trained using DPSGD and its variants often suffer from significant model performance degradation. Such degradation prevents the application of DP optimization in many key tasks, such as foundation model pre-training.



Learning from Heterophilic Graphs: A Spectral Theory Perspective on the Impact of Self-Loops and Parallel Edges

arXiv.org Artificial Intelligence

Graph heterophily poses a formidable challenge to the performance of Message-passing Graph Neural Networks (MP-GNNs). The familiar low-pass filters like Graph Convolutional Networks (GCNs) face performance degradation, which can be attributed to the blending of the messages from dissimilar neighboring nodes. The performance of the low-pass filters on heterophilic graphs still requires an in-depth analysis. In this context, we update the heterophilic graphs by adding a number of self-loops and parallel edges. We observe that eigenvalues of the graph Laplacian decrease and increase respectively by increasing the number of self-loops and parallel edges. We conduct several studies regarding the performance of GCN on various benchmark heterophilic networks by adding either self-loops or parallel edges. The studies reveal that the GCN exhibited either increasing or decreasing performance trends on adding self-loops and parallel edges. In light of the studies, we established connections between the graph spectra and the performance trends of the low-pass filters on the heterophilic graphs. The graph spectra characterize the essential intrinsic properties of the input graph like the presence of connected components, sparsity, average degree, cluster structures, etc. Our work is adept at seamlessly evaluating graph spectrum and properties by observing the performance trends of the low-pass filters without pursuing the costly eigenvalue decomposition. The theoretical foundations are also discussed to validate the impact of adding self-loops and parallel edges on the graph spectrum.



ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Y u Chen

Neural Information Processing Systems

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets.